PARALLEL  EXECUTION  OF  A SEQUENCE  OF 
TASKS  ON  AN  ASYNCHRONOUS  MULTIPROCESSOR 

G. M. Baudet*

R. P. Brent†

H. T. Kung*


June 1977


* Department of Computer Science, Carnegie-Mellon University, Pittsburgh,
PA 15213

† Computer Centre, The Australian National University, Canberra, A.C.T.,
Australia 2600

This research is supported in part by the National Science Foundation
under Grant MCS 75-222-55, the Office of Naval Research under Contract
N00014-76-C-0370, NR 044-422, and a research grant from the
Institut de Recherche d'Informatique et d'Automatique (IRIA), France.


ABSTRACT 


Given a sequence of tasks to be performed serially, a parallel algorithm
is proposed to accelerate the execution of the tasks on an asynchronous
multiprocessor by taking advantage of fluctuations in the execution
times.

A parallel  program  requiring  no  critical  section  is  given  to  implement 
the  algorithm  and  its  correctness  is  proved.  A spacewise  more  efficient 
implementation  is  also  given  but  requires  the  use  of  critical  sections. 

An analysis is presented for both implementations to estimate the
speed-up achievable with the parallel algorithm. When the execution times
are exponentially distributed, and no critical section is used, the
algorithm with k processes yields a speed-up of order $\sqrt{k}$.


1. INTRODUCTION


We  are  interested  in  the  design  and  analysis  of  parallel  algorithms 
for  asynchronous  multiprocessors  such  as  C.mmp  [6].  For  any  given  task, 
the  task  execution  time  on  such  a system  is  dependent  upon  the  properties 
of  the  operating  system,  effects  of  other  users,  processor-memory  interfer- 
ence, and  many  other  factors.  As  a result,  it  is  necessary  to  assume  that 
task  execution  times  are  random  variables  rather  than  constants.  In  this 
paper  we  propose  a novel  way  of  using  asynchronous  multiprocessors,  which 
takes  advantage  of  fluctuations  in  task  execution  times.  We  shall  present 
our result as a solution to the problem of executing a sequence of n tasks
$w_1, \ldots, w_n$ under the following conditions:

C1. For $i = 2, \ldots, n$, task $w_i$ cannot be started before the completion
of $w_{i-1}$ (i.e., the tasks are linearly ordered).

C2. For $i = 1, \ldots, n$, no parallelism can be utilized in the execution
of $w_i$ (i.e., we are not allowed to decompose a task).

C3.  The  execution  time  of  a task  is  a random  variable  rather  than  a 

constant.  (This  condition  corresponds  to  the  asynchronous  nature 
of  the  multiprocessor.) 

We will view a parallel algorithm for asynchronous multiprocessors as
a collection of asynchronous processes which communicate with each other
through the use of global variables. Such an algorithm will be defined by
giving the procedure each of its processes executes when assigned to a
processor. While analyzing the algorithm, we shall always assume that a
processor is available for any of the runnable processes of the algorithm.
(See Kung [3] for a general discussion of asynchronous parallel algorithms.)

In Section 2 we give an algorithm which uses $k \ge 1$ asynchronous processes
to solve the problem. The algorithm is interesting because at most one
process is doing useful work at any given time. Nevertheless, by taking
advantage of condition C3, the mean execution time is less for k > 1 than
for k = 1, i.e., a speed-up is achieved.

As an example, consider the computation of $x_1, \ldots, x_n$ defined by

$$x_{i+1} = \varphi(x_i, x_{i-1}, \ldots, x_{i-d}),$$

where $x_0, x_{-1}, \ldots, x_{-d}$ are given and $\varphi$ is some iteration function.
Let $w_{i+1}$ be the task of computing $\varphi(x_i, \ldots, x_{i-d})$. Our algorithm could
be used to execute tasks $w_1, \ldots, w_n$, which is equivalent to evaluating
$x_1, \ldots, x_n$.

The speed-up ratio $S_k(n)$ of a parallel algorithm using k processes is defined
in Section 3, and some preliminary results are proved there. In Section 4
we give programs to implement our algorithm both with and without using
critical sections and prove informally their correctness. In Section 5 we
consider the implementation without critical sections, and obtain an
analytic expression for the speed-up under certain assumptions (A1 and A2 of
Section 5). For large n and k, our result is $S_k(n) \approx \sqrt{2k/\pi}$.

In Section 6 we consider the implementation which uses critical sections.
Here the analysis is more difficult, and we can obtain analytic
results only for $k \le 2$. Some conclusions and open problems are stated in
Section 7.


2. THE ALGORITHM

For each positive integer k, we define an algorithm with k processes
for executing tasks $w_1, \ldots, w_n$ under conditions C1 and C2 stated in the
preceding section. The algorithm is specified as follows:

Whenever a process, P, is ready to execute a task,

(i) if no task has yet been completed by any process, process P
starts executing task $w_1$,

(ii) otherwise, if the last task $w_n$ has not yet been completed by
any process, process P starts executing a task which is
unfinished and ready for execution.

For simplicity, we shall assume that no two tasks are completed at the same
time. Then, due to the linear ordering of the tasks, (ii) defines without
ambiguity a unique task to be executed by process P.

Let $t_1, t_2, \ldots$, with $t_1 < t_2 < \cdots$, be the time instants of completions
of tasks by the processes. The diagram in Figure 2.1 illustrates a possible
scheduling of the tasks when they are executed by the algorithm with three
processes.

Note that when a process finishes a task, another process may already have
completed the task that follows it. The finishing process then starts
executing the first task that is still unfinished rather than the immediate
successor of the task it has just finished; the tasks in between are skipped
by that process. In the scheduling of Figure 2.1, each of the three processes
skips some tasks in this way. After any one of the three processes has
executed six tasks, more tasks than $w_1$ through $w_6$ have been completed.
A speed-up has been achieved!

Observe  that  at  any  given  time  at  most  one  process  is  doing  work  use- 
ful for  later  computation.  With  respect  to  the  scheduling  given  by  Figure 
2.1,  the  time  intervals  on  which  processes  are  doing  useful  computations 
are  indicated  in  Figure  2.2. 


Figure 2.2. Time intervals on which processes are doing useful work.


Thus  the  speed  up  is  not  achieved  by  sharing  work  among  processes  but  is 
achieved  by  taking  advantage  of  fluctuations  in  the  execution  times. 
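The scheduling rule of this section can be tried out with a small event-driven simulation. The sketch below is illustrative only: the names `simulate` and `mean_time`, the exponentially distributed task times, and the choice that all processes start together at time 0 are assumptions made here (they anticipate assumptions A1 and A2 of Section 5) and are not part of the definition of the algorithm.

```python
import heapq
import random


def simulate(n, k, mean=1.0, rng=None):
    """Return the time at which task w_n is first completed by k processes.

    Assumption (not from the text): task times are exponential with the
    given mean, and all k processes start task w_1 at time 0.
    """
    rng = rng or random.Random()
    done = 0  # tasks w_1 .. w_done have been completed so far
    # One heap entry per process: (finish time, index of the task it runs).
    events = [(rng.expovariate(1.0 / mean), 1) for _ in range(k)]
    heapq.heapify(events)
    while True:
        now, task = heapq.heappop(events)
        if task == done + 1:  # first completion of this task
            done = task
            if done == n:
                return now
        # Rule (ii): the process starts the unique unfinished, ready task.
        heapq.heappush(events, (now + rng.expovariate(1.0 / mean), done + 1))


def mean_time(n, k, runs=2000, seed=0):
    """Average completion time over several simulated executions."""
    rng = random.Random(seed)
    return sum(simulate(n, k, rng=rng) for _ in range(runs)) / runs
```

Comparing `mean_time(50, 1)` with `mean_time(50, 3)` gives an empirical feel for the speed-up, even though only one process at a time does work that is used later.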
3. A SPEED-UP MEASURE

Consider the algorithm with k processes as specified in the preceding
section. The algorithm is said to be the sequential algorithm if k = 1
and to be a parallel algorithm if k > 1. Let

$T_k(n)$ = the time to execute tasks $w_1, \ldots, w_n$ by the
algorithm with k processes.

Let $\bar{T}_k(n)$ be the mean of the random variable $T_k(n)$. We define the speed-up
ratio of the parallel algorithm with k processes to be

$$S_k(n) = \frac{\bar{T}_1(n)}{\bar{T}_k(n)}.$$

For each k and for each execution of the algorithm with k processes,
we define $s_{k,i}$ to be the time instant of the first completion of task $w_i$,
and define $s_{k,0} = 0$. For example, with respect to the scheduling of
Figure 2.1, with k = 3, we have


s 


3.1 


■5’  3,A 


3,5 


3,6 


•12’  3,7 


■13’ 


The following theorem describes the relation between $\{s_{k,i}\}$ and $\{t_\ell\}$
in terms of the scheduling of the tasks. This theorem is important in
Sections 5 and 6 for computing speed-up ratios.

Theorem 3.1

Suppose that $s_{k,i} = t_\ell$ with $1 \le i \le n-1$. Then $s_{k,i+1} = t_{\ell+j}$ for some
$1 \le j \le k$ if and only if

(a) the j processes completing tasks at times $t_\ell, t_{\ell+1}, \ldots, t_{\ell+j-1}$
are all distinct, and

(b) the process completing a task at time $t_{\ell+j}$ is one of the
j processes mentioned in (a).


Proof.

(⇒) Suppose that some process P completes two tasks at times $t_{\ell+h}$ and
$t_{\ell+m}$ for $0 \le h < m \le j-1$. Then, since at time $t_{\ell+h}$ task $w_i$ has already been
completed, the task completed at time $t_{\ell+m}$ by process P must be $w_{i+1}$. This
contradicts the fact that $w_{i+1}$ is completed for the first time at time
$t_{\ell+j}$, since $\ell+m < \ell+j$. This proves (a).

Let P be the process completing task $w_{i+1}$ for the first time, at time
$t_{\ell+j}$. Suppose that P does not complete any task in the time interval
$[t_\ell, t_{\ell+j-1}]$. Then the task completed by P at time $t_{\ell+j}$ must have been
started before time $t_\ell$. But at any time before $t_\ell$, task $w_i$ is not completed yet.
Hence any task started before time $t_\ell$ cannot be $w_{i+1}$. In particular, the
task completed by P at time $t_{\ell+j}$ cannot be $w_{i+1}$. This contradiction proves
(b).

(⇐) The proof is omitted, since it is similar to the one above.
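Theorem 3.1 can be read as a rule for locating first completions from the order in which processes complete tasks. The following sketch applies that rule; the function name `first_completions` and the encoding of a schedule as a list of process identifiers are assumptions introduced here for illustration, and the list is assumed to end before the last task is finished.

```python
def first_completions(order):
    """Locate first completions using conditions (a) and (b) of Theorem 3.1.

    order[l - 1] is the id of the process completing a task at time t_l.
    Returns the 1-based indices l at which some task is completed for the
    first time (t_1 is always the first completion of w_1).
    """
    result = []
    l = 0
    while l < len(order):
        result.append(l + 1)
        seen = {order[l]}  # processes that have completed a task since t_l
        j = l + 1
        # (a): keep going while the completing processes are all distinct;
        # (b): stop at the first process that is already in the set.
        while j < len(order) and order[j] not in seen:
            seen.add(order[j])
            j += 1
        l = j
    return result
```

For instance, with two processes completing tasks in the order `[0, 1, 0, 1, 1]`, the rule marks the completions at $t_1$, $t_3$ and $t_5$ as first completions.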


For $i = 1, 2, \ldots, n$, let $\tau_k(i)$ be the random variable representing the quantity
$s_{k,i} - s_{k,i-1}$. Then since $T_k(n) = s_{k,n}$, we have

(3.1) $T_k(n) = \tau_k(1) + \tau_k(2) + \cdots + \tau_k(n)$.

Equation (3.1) will be used later to compute $\bar{T}_k(n)$, which is needed for evaluating
the speed-up ratio $S_k(n)$.


4. PARALLEL PROGRAMS FOR THE ALGORITHM AND THEIR CORRECTNESS

We give two programs to implement the algorithm with k processes: one
without critical sections and one with critical sections.


4.1.  A Program  without  Critical  Sections 


Program A:

    global (integer or real) array U[1:n];
    global Boolean array M[1:n+1];

    Initialization: begin
        for m ← 1 to n+1 do M[m] ← false;
        start processes P_1, ..., P_k
    end

    Process P_j:
    begin integer m_j;
            m_j ← 1;
    (4.1)   while M[m_j] do m_j ← m_j + 1;
    (4.2)   while m_j ≤ n do
            begin
    (4.3)       perform task w_{m_j};
    (4.4)       write the output of task w_{m_j} on U[m_j];
    (4.5)       M[m_j] ← true;
    (4.6)       while M[m_j] do m_j ← m_j + 1
            end
    end
Assume that the tasks are not allowed to alter the array M and the
integers $m_j$. We shall prove that Program A is correct in the following sense:

P1. For $m = 2, \ldots, n$, task $w_m$ is executed only if task $w_{m-1}$ has been finished
and its output has been written on U[m-1].

P2. For $j = 1, \ldots, k$, process $P_j$ can execute the loops at (4.1), (4.2) and
(4.6) at most n times.

P3. All the tasks $w_1, \ldots, w_n$ will have been completed at the time when any
one of the processes $P_1, \ldots, P_k$ terminates its execution.

Property P2 guarantees that the program will terminate. (Note that
there is no possibility of deadlocks in the program.) Property P1 ensures
that the linear ordering requirement of the executions of the tasks is
maintained, and property P3 implies that when the program terminates all the
tasks are completed.
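Program A translates almost line by line into threads that share two lists. The sketch below is one possible rendering, not the authors' implementation: the function name `run_program_a`, the representation of each task as a deterministic Python callable that receives the previous task's output, and the use of `threading` are all assumptions made for illustration. As in Program A, no lock is used; the code relies only on single-element reads and writes of the shared lists.

```python
import threading


def run_program_a(tasks, k):
    """Run tasks w_1..w_n with k threads, following Program A.

    tasks[m - 1] is a deterministic callable for w_m; it receives the
    output of w_{m-1} (None for w_1). This calling convention is an
    assumption of the sketch.
    """
    n = len(tasks)
    U = [None] * (n + 1)   # U[m] holds the output of w_m (index 0 unused)
    M = [False] * (n + 2)  # M[m] is True once w_m is done; M[n+1] stays False

    def process():
        m = 1
        while M[m]:                      # (4.1) skip completed tasks
            m += 1
        while m <= n:                    # (4.2)
            prev = U[m - 1] if m > 1 else None
            out = tasks[m - 1](prev)     # (4.3) perform task w_m
            U[m] = out                   # (4.4) write its output
            M[m] = True                  # (4.5) mark w_m as completed
            while M[m]:                  # (4.6) move to first unfinished task
                m += 1

    threads = [threading.Thread(target=process) for _ in range(k)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return U[1:]
```

Because several threads may execute the same task, the tasks must be deterministic so that every write to `U[m]` stores the same value; writing the output before setting `M[m]` mirrors the order of (4.4) and (4.5) that property P1 depends on.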


Lemma 4.1

(i) For $m = 1, \ldots, n$, if M[m] is set to true, it remains true afterwards.
(ii) After being initialized to false, M[n+1] is never modified.


Proof

After initialization, M can only be modified through statement (4.5)
executed by some process $P_j$. This proves (i), since (4.5) only assigns the
value true. Moreover, when entering the main while-loop (starting
at (4.2)), $m_j$ satisfies the condition $m_j \le n$ and is not modified before the
execution of (4.5). Therefore M[n+1] can never be modified. ■


Lemma 4.2

For $j = 1, \ldots, k$, if $m_j$ has the value $m \ge 2$, then M[m-1] is true.

Proof

Suppose that $m_j = m$ with $m \ge 2$ at time t. Since $m_j$ is initialized to 1 and
is changed only by being incremented by one inside the while statement
(4.1) or (4.6), the test of M[$m_j$] being true with $m_j = m-1$ must have been
satisfied. Hence M[m-1] was true at some time before t. Thus, by Lemma 4.1,
M[m-1] is true at time t. ■


Lemma 4.3

For $m = 2, \ldots, n$, if M[m] is true, then M[m-1] is true.

Proof

Suppose that M[m] is true. Then M[m] must have been assigned to true
through instruction (4.5) by some process $P_j$ with $m_j$ having the value m.
Therefore, by Lemma 4.2, M[m-1] is true. ■


Lemma 4.4

For $m = 1, \ldots, n$, if M[m] is true, then task $w_m$ is completed and its
output is on U[m].


Proof

Suppose that M[m] is true. Then M[m] must have been assigned to true
through instruction (4.5) by some process $P_j$ with $m_j$ having the value m.
Since $P_j$ executes instruction (4.5) only after the completion of task $w_{m_j}$
and the writing of its output at (4.4), and since $m_j$ is not modified in
between, we conclude that task $w_m$ is completed and its output is on U[m]. ■

We are now able to prove the following theorem.

Theorem 4.1

Program A satisfies properties P1, P2 and P3.


Proof

1. Suppose that process $P_j$ is executing $w_m$ with $m = m_j \ge 2$. Then by Lemma
4.2 M[m-1] is true, and hence by Lemma 4.4 $w_{m-1}$ is completed and its
output is on U[m-1]. We conclude that Program A satisfies P1.

2. Property P2 follows from (ii) of Lemma 4.1, since $m_j$ is incremented by
one in each execution of a loop.

3. Suppose that a process, say process $P_j$, terminates. This happens only
when $m_j = n+1$. Thus by Lemma 4.2 M[n] is true and hence by Lemma 4.3
M[m] is true for all $m = 1, \ldots, n$. Therefore by Lemma 4.4 all tasks are
completed. We have shown that Program A satisfies property P3. ■


Program  A is  very  reliable  in  the  following  sense.  Property  P3  implies 
that,  even  if  some  processes  fail  (for  reasons  external  to  the  algorithm: 
e.g.,  "crash"  of  the  processors  executing  the  processes)  the  program  may 
still  continue  executing  tasks  and  eventually  complete  all  the  tasks  provided 
that  there  remains  at  least  one  active  process.  We  shall  not  pursue  this 
reliability  issue  any  further  in  this  paper,  though  we  believe  it  is 
important. 


4.2. A Program with Critical Sections

For problems where we are only interested in the output of the last
task $w_n$, the use of the global arrays U[1:n] and M[1:n+1] in Program A can
be avoided at the expense of using critical sections.

We shall illustrate the idea with the following example. Consider the
problem of generating the nth iterate $x_n$ by $x_{i+1} = \varphi(x_i)$, given the initial
iterate $x_0$. Suppose that we use Program A. Then corresponding to the
global array U[1:n] we have the global array x[0:n] where x[i] keeps the
value of the ith iterate, and (4.3) and (4.4) become

    x[m_j] ← φ(x[m_j - 1]).

Note that we only need x[n]. The use of the array x[0:n] is wasteful in
space, and might even be impractical (e.g., when n is large and when the
elements x[0], x[1], ..., x[n] are themselves vectors or complicated structures).
The following program solves the problem:

Program B:

    global integer m; global real x;

    Initialization: begin
        m ← 1; x ← x_0;
        start processes P_1, ..., P_k
    end

    Process P_j:
    begin integer m_j; real y_j;
    (4.7)   {m_j ← m; y_j ← x}
            while m_j ≤ n do
            begin
                y_j ← φ(y_j);
    (4.8)       {if m_j = m then (m ← m_j + 1; x ← y_j)};
    (4.9)       {m_j ← m; y_j ← x}
            end
    end
It is crucial to assume that the statements enclosed within a pair of
curly brackets (lines (4.7), (4.8) and (4.9)) are programmed as critical
sections. (As a matter of fact, the two lines (4.8) and (4.9) can be
programmed as one critical section.) With this assumption it is possible to
prove the correctness of the above program. The proof is based on the
observation that the global variable m is a non-decreasing function of
time which takes on all integer values between 1 and n+1. The proof is
relatively easy and hence is omitted here.
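A minimal sketch of Program B with threads is given below, assuming that a single `threading.Lock` plays the role of the critical sections and that lines (4.8) and (4.9) are merged into one section as remarked above. The function name `run_program_b` and the dictionary holding the two global variables are choices made here for illustration only.

```python
import threading


def run_program_b(phi, x0, n, k):
    """Return the n-th iterate of x_{i+1} = phi(x_i) using k threads."""
    lock = threading.Lock()
    shared = {"m": 1, "x": x0}  # the global variables m and x of Program B

    def process():
        with lock:                                  # (4.7)
            mj, yj = shared["m"], shared["x"]
        while mj <= n:
            yj = phi(yj)                            # candidate value of x_{mj}
            with lock:                              # (4.8) and (4.9) together
                if mj == shared["m"]:               # first to finish this step
                    shared["m"], shared["x"] = mj + 1, yj
                mj, yj = shared["m"], shared["x"]

    threads = [threading.Thread(target=process) for _ in range(k)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["x"]
```

Only the most recent iterate is stored, so the space used is independent of n, at the cost of serializing the copies of x inside the lock.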


Note that, as was already mentioned, x and $y_j$ may represent a large
amount of data. Hence the execution of x ← y_j or y_j ← x inside a critical
section may take a significant amount of time. After presenting, in
Section 5, an analysis for programs which do not have critical sections, we
will give, in Section 6, an analysis for programs which do have critical
sections.


5. SPEED-UP RATIOS - IMPLEMENTATIONS WITHOUT CRITICAL SECTIONS

Let $t_{i,j}$ be the random variable representing the time to execute task
$w_i$ by process $P_j$. In this and the next section, we assume that the $t_{i,j}$,
for $i = 1, \ldots, n$ and $j = 1, \ldots, k$, are independent and identically distributed.
The assumption is reasonable when all tasks are of the same complexity and
executed by identical processors. We shall use $t$ to denote any of the
random variables $t_{i,j}$ and use $\tau$ to denote the mean of $t$.

It is easy to obtain $\bar{T}_1(n)$. By (3.1) with k = 1, we have

$$T_1(n) = \tau_1(1) + \tau_1(2) + \cdots + \tau_1(n).$$

Since, in this case, the $\tau_1(i)$ are independent and identically distributed
with mean $\tau$, we deduce

(5.1) $\bar{T}_1(n) = n\tau$.

In the rest of the paper, in order to evaluate $\bar{T}_k(n)$, we impose the
following further assumptions:

A1. All processes start at the same time $t_0 = 0$. (I.e., at $t_0$ all
the k processes start with the execution of task $w_1$.)

A2. The random variable $t$ is exponentially distributed with mean $\tau$.

We observe that by the independence of the $t_{i,j}$ and by assumption A2
the quantities $\tau_k(i)$, $i = 1, \ldots, n$, are independent random variables. It
follows, from equation (3.1), that the means satisfy

(5.2) $\bar{T}_k(n) = \bar{\tau}_k(1) + \cdots + \bar{\tau}_k(n)$.
In addition, by assumption A1, $\tau_k(1)$ is given by the minimum of k
random variables distributed as $t$. Since $t$ is exponentially distributed,
the minimum has the mean:

(5.3) $\bar{\tau}_k(1) = \frac{\tau}{k}$.
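The value in (5.3) follows from a standard computation, sketched here for completeness. The symbols $t^{(1)}, \ldots, t^{(k)}$ are introduced only for this sketch and denote the k independent execution times of task $w_1$, each exponential with mean $\tau$; the left-hand side of the last equality is the mean of the first increment, the quantity appearing in (5.3).

```latex
\Pr\bigl\{\min(t^{(1)},\dots,t^{(k)}) > x\bigr\}
  = \prod_{j=1}^{k} \Pr\bigl\{t^{(j)} > x\bigr\}
  = \bigl(e^{-x/\tau}\bigr)^{k}
  = e^{-kx/\tau},
\qquad
\mathrm{E}\bigl[\min(t^{(1)},\dots,t^{(k)})\bigr]
  = \int_{0}^{\infty} e^{-kx/\tau}\,dx
  = \frac{\tau}{k}.
```

The minimum is therefore itself exponentially distributed, with mean $\tau/k$.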


We now consider $\tau_k(i+1)$ for $i = 1, \ldots, n-1$. Define the probability
distribution $p_{k,j}$, $j = 1, 2, \ldots$, as follows. (We use here the same notation
as in Section 3.) Let $p_{k,j}$ be the probability that $s_{k,i+1} = t_{\ell+j}$ given
that $s_{k,i} = t_\ell$ for some $\ell$. Hence for $1 \le j \le k$, $p_{k,j}$ is the probability
that conditions (a) and (b) of Theorem 3.1 hold. Using the same argument as
used in the proof of Theorem 3.1, it is easy to show that $p_{k,j} = 0$ if $j > k$.
In addition, assumption A2 implies, by the memoryless property of
the exponential distribution, that $p_{k,j}$ is independent of $i$ and $\ell$. We have

(5.4)
$$\tau_k(i+1) = \begin{cases}
t_{\ell+1} - t_\ell & \text{with probability } p_{k,1}, \\
(t_{\ell+1} - t_\ell) + (t_{\ell+2} - t_{\ell+1}) & \text{with probability } p_{k,2}, \\
\quad\vdots \\
(t_{\ell+1} - t_\ell) + \cdots + (t_{\ell+k} - t_{\ell+k-1}) & \text{with probability } p_{k,k}.
\end{cases}$$

Since by assumption A2 the random variables $t_{i+1} - t_i$, $i = 1, 2, \ldots$, are
independent (and identically distributed) random variables with mean $\tau/k$, we
derive from (5.4) that, for $i = 1, \ldots, n-1$, the mean of $\tau_k(i+1)$ is given by:

(5.5) $\bar{\tau}_k(i+1) = \sum_{j=1}^{k} p_{k,j} \cdot j\,\frac{\tau}{k} = \frac{\tau}{k} \sum_{j=1}^{k} j\,p_{k,j}$.
j-i  j-i 
By (5.2), (5.3) and (5.5), we obtain that

(5.6) $\bar{T}_k(n) = \frac{\tau}{k}\Bigl[1 + (n-1)\sum_{j=1}^{k} j\,p_{k,j}\Bigr]$.

To evaluate $\bar{T}_k(n)$ we need to know the following quantity:


Lemma 5.1

(5.7) $p_{k,j} = \frac{j\,k!}{k^{j+1}\,(k-j)!}$, for $j = 1, \ldots, k$.


Proof

We first observe that, by assumption A2, for $i = 1, 2, \ldots$, any one of
the k processes is equally likely to complete a task at time $t_i$. Suppose
that $s_{k,i} = t_\ell$ and $s_{k,i+1} = t_{\ell+j}$. Then by condition (a) of Theorem 3.1,
the j processes completing tasks at times $t_\ell, \ldots, t_{\ell+j-1}$ are all different.
This occurs with probability

(5.8) $\frac{k}{k} \cdot \frac{k-1}{k} \cdots \frac{k-j+1}{k} = \frac{k!}{k^{j}\,(k-j)!}$.

Moreover, by condition (b) of Theorem 3.1, the process completing a task at
time $t_{\ell+j}$ must be one of the j processes mentioned above. This occurs with
probability $j/k$. Hence the probability that $s_{k,i+1} = t_{\ell+j}$ is

$$\frac{j}{k} \cdot \frac{k!}{k^{j}\,(k-j)!}. \qquad ■$$

The problem of computing the leading terms in the asymptotic series
for $\sum_{j=1}^{k} j\,p_{k,j}$ is rather difficult. Fortunately, some known results can be used
here. Define

$$Q_k = \sum_{j=1}^{k} \frac{k!}{k^{j}\,(k-j)!}.$$

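Lemma 5.2

$$\sum_{j=1}^{k} j\,p_{k,j} = Q_k.$$


Proof

We have

$$\sum_{j=1}^{k} j\,p_{k,j}
 = \sum_{j=1}^{k} \frac{j\,k!}{k^{j}\,(k-j)!} - \sum_{j=1}^{k-1} \frac{j\,k!}{k^{j+1}\,(k-j-1)!}
 = \sum_{j=1}^{k} \frac{j\,k!}{k^{j}\,(k-j)!} - \sum_{j=1}^{k} \frac{(j-1)\,k!}{k^{j}\,(k-j)!}
 = \sum_{j=1}^{k} \frac{k!}{k^{j}\,(k-j)!}. \qquad ■$$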
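Lemma 5.1 and Lemma 5.2 are easy to check with exact rational arithmetic. In the sketch below the function names `p` and `Q` are chosen here to mirror $p_{k,j}$ and $Q_k$; the code only evaluates the closed forms stated in (5.7) and in the definition of $Q_k$.

```python
from fractions import Fraction
from math import factorial


def p(k, j):
    """p_{k,j} = j k! / (k^(j+1) (k-j)!), as in (5.7)."""
    return Fraction(j * factorial(k), k ** (j + 1) * factorial(k - j))


def Q(k):
    """Q_k = sum over j = 1..k of k! / (k^j (k-j)!)."""
    return sum(
        Fraction(factorial(k), k ** j * factorial(k - j))
        for j in range(1, k + 1)
    )


def check(k):
    """True if the p_{k,j} sum to 1 and their mean index equals Q_k."""
    js = range(1, k + 1)
    return (sum(p(k, j) for j in js) == 1
            and sum(j * p(k, j) for j in js) == Q(k))
```

For example, `Q(2)` is the fraction 3/2, and `check(k)` confirms both identities exactly for any small k one cares to try.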


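The leading terms in the asymptotic series for $Q_k$ are known (Knuth [2, Eq. (25)
in §1.2.11.3]):

$$Q_k = \sqrt{\frac{\pi k}{2}} - \frac{1}{3} + \frac{1}{12}\sqrt{\frac{\pi}{2k}} - \frac{4}{135k} + \frac{1}{288}\sqrt{\frac{\pi}{2k^{3}}} + O(k^{-2}).$$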
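The quality of this expansion can be judged numerically. The sketch below compares the exact sum with the five leading terms quoted above; the function names `q_exact` and `q_asymptotic` are illustrative choices, and floating-point arithmetic is assumed to be accurate enough for this purpose.

```python
import math


def q_exact(k):
    """Q_k computed term by term in floating point."""
    total, term = 0.0, 1.0
    for j in range(1, k + 1):
        term *= (k - j + 1) / k  # term is now k! / (k^j (k-j)!)
        total += term
    return total


def q_asymptotic(k):
    """Leading terms of the asymptotic series for Q_k."""
    r = math.sqrt(math.pi / (2 * k))
    return (math.sqrt(math.pi * k / 2) - 1 / 3 + r / 12
            - 4 / (135 * k) + r / (288 * k))
```

Evaluating both functions for a few values of k shows the truncated series approaching the exact value quickly as k grows.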


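Hence by (5.1), (5.6) and Lemma 5.2, we have the following theorem.


Theorem 5.1

Using k processes, the speed-up ratio is given by

$$S_k(n) = \frac{nk}{1 + (n-1)\,Q_k},$$

where

$$Q_k = \sqrt{\frac{\pi k}{2}} - \frac{1}{3} + O(k^{-1/2}).$$

Asymptotically, when both n and k are large we obtain

$$S_k(n) \approx \sqrt{\frac{2k}{\pi}}.$$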
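To see how the speed-up grows with k, the ratio of (5.1) to (5.6) can be evaluated directly, using Lemma 5.2 to replace the sum by $Q_k$. The sketch below does this; the function names `q` and `speedup` are assumptions of the sketch, and the formula coded is $nk / (1 + (n-1) Q_k)$, obtained by dividing $n\tau$ by $(\tau/k)[1 + (n-1) Q_k]$.

```python
import math


def q(k):
    """Q_k = sum over j = 1..k of k! / (k^j (k-j)!)."""
    total, term = 0.0, 1.0
    for j in range(1, k + 1):
        term *= (k - j + 1) / k
        total += term
    return total


def speedup(n, k):
    """S_k(n) = n*tau / ((tau/k) * (1 + (n-1) * Q_k)); tau cancels."""
    return n * k / (1 + (n - 1) * q(k))


if __name__ == "__main__":
    for k in (1, 2, 4, 16, 64, 256):
        # Compare with the large-k approximation sqrt(2k/pi).
        print(k, speedup(10**6, k), math.sqrt(2 * k / math.pi))
```

The printed columns make the square-root growth visible: multiplying k by four roughly doubles the speed-up.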
In the following, we derive a general formula for evaluating the
speed-up ratio achievable by the parallel algorithm with two processes for the
case when F is an exponential distribution function and B is a general
distribution function.

Observe that at time $t_\ell$ when a process enters the critical section, the
second process is necessarily performing some task (possibly just starting
a task). Since the distribution function F is exponential, at time $t_\ell$ the
remaining execution time for the task performed by the second process is
distributed according to the same distribution function F. Therefore the
evolution of the processes, from time $t_\ell$ on, is independent of the past for
any distribution B. In particular, the random variables $t_{\ell+1} - t_\ell$, for
$\ell = 1, 2, \ldots$, are independent and identically distributed, and the same holds
for the random variables $\tau_k(i+1)$, for $i = 1, 2, \ldots$, defined in Section 3.

In this section, let $T_1(n)$ and $T_2(n)$ denote the time to complete task
$w_n$ and the subsequent critical section by the sequential algorithm and the
parallel algorithm with two processes, respectively. Let $\bar{T}_1(n)$ and $\bar{T}_2(n)$
denote their means. It follows from the above discussion that, for k = 1
and 2, we have:

(6.1) $\bar{T}_k(n) = \bar{\tau}_k(1) + \bar{\tau}_k(2) + \cdots + \bar{\tau}_k(n) + \beta$,

where the last term, $\beta$, accounts for the time to execute the last critical
section (after the completion of task $w_n$).

Consider first the sequential algorithm. In this case, we simply have
$\bar{\tau}_1(1) = \tau$ and, for $i = 2, \ldots, n$, $\bar{\tau}_1(i) = \beta + \tau$. Therefore, by equation (6.1),

(6.2) $\bar{T}_1(n) = n(\tau + \beta)$.
(Here we ignore the fact that in the sequential algorithm the critical
section can be shortened, since there is no need to include synchronization
primitives.)

Consider now the parallel algorithm. As in (5.3), we have

(6.3) $\bar{\tau}_2(1) = \tfrac{1}{2}\tau$.

For j = 1 and 2, let $p_j$ be the probability that $s_{2,i+1} = t_{\ell+j}$ given
that $s_{2,i} = t_\ell$ for some $\ell$. As in Section 5, by Theorem 3.1, we obtain, for
$i = 1, 2, \ldots, n-1$,

(6.4)
$$\tau_2(i+1) = \begin{cases}
t_{\ell+1} - t_\ell & \text{with probability } p_1, \\
(t_{\ell+1} - t_\ell) + (t_{\ell+2} - t_{\ell+1}) & \text{with probability } p_2.
\end{cases}$$

We have already mentioned that the random variables $t_{\ell+1} - t_\ell$, $\ell = 1, 2, \ldots$,
are independent and identically distributed. Let $\mu$ denote their mean. It
follows from (6.4) that the mean of $\tau_2(i+1)$ is given by

(6.5) $\bar{\tau}_2(i+1) = p_1 \cdot \mu + p_2 \cdot 2\mu = (2 - p_1)\,\mu$,

since $p_1 + p_2 = 1$.

The following lemma establishes the values of $\mu$ and $p_1$.


Lemma 6.1

(6.6) $\mu = \beta + \tfrac{1}{2}\,\tau\,B^*(1/\tau)$,

(6.7) $p_1 = \tfrac{1}{2}\,B^*(1/\tau)$,

where $B^*$ is the Laplace transform of the distribution function B.
Proof

We consider transitions for passing from time $t_\ell$ to time $t_{\ell+1}$. Up to
a permutation of the processes, there are three possible transitions, $A_1$,
$A_2$ and $A_3$, defined by diagrams in which the notation of Figure 6.1 is assumed.

[Diagrams of transitions $A_1$, $A_2$ and $A_3$.]

Let $H_j(t)$, j = 1, 2 and 3, be the probability that transition $A_j$ takes
place and that $t_{\ell+1} - t_\ell \le t$. We have:

$$H_1(t) = \int_0^t [1 - F(x)] \int_0^x b(y)\,f(x-y)\,dy\,dx,$$

$$H_2(t) = \int_0^t f(x) \int_0^x b(y)\,[1 - F(x-y)]\,dy\,dx,$$

$$H_3(t) = \int_0^t b(x)\,F(x)\,dx.$$

But we observe that $H(t) = H_1(t) + H_2(t) + H_3(t)$ is the distribution function
for $t_{\ell+1} - t_\ell$ and that the same process enters the critical section at both times
$t_\ell$ and $t_{\ell+1}$ only with transition $A_1$. Hence:

$$\mu = \int_0^\infty t\,dH(t) = \int_0^\infty [1 - H(t)]\,dt,$$

$$p_1 = \int_0^\infty dH_1(t) = \int_0^\infty [1 - F(x)] \int_0^x b(y)\,f(x-y)\,dy\,dx,$$

from which equations (6.6) and (6.7) follow easily. ■
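The values in Lemma 6.1 can be checked by sampling one transition at a time. The sketch below rests on a reading of the three integrals that is an interpretation made here, not a statement from the text: y is the length of the critical section, x the remaining task time of the other process, and z the next task time of the process leaving the section; $A_3$ is taken to be the case $x \le y$, $A_1$ the case $y + z < x$, and $A_2$ the remaining case. The function name `estimate` and its parameters are likewise illustrative.

```python
import random


def estimate(tau, sample_b, trials=200_000, seed=1):
    """Monte Carlo estimates of (mu, p_1) for two processes.

    sample_b(rng) draws one critical-section length from B.
    """
    rng = random.Random(seed)
    total, same = 0.0, 0
    for _ in range(trials):
        y = sample_b(rng)                 # critical section now starting
        x = rng.expovariate(1.0 / tau)    # remaining task, other process
        z = rng.expovariate(1.0 / tau)    # next task, same process
        if x <= y:
            total += y                    # A3: other process waits, enters at y
        elif y + z < x:
            total += y + z                # A1: same process enters again
            same += 1
        else:
            total += x                    # A2: other process enters at x
    return total / trials, same / trials
```

With B exponential of mean $\beta$, for example `estimate(1.0, lambda rng: rng.expovariate(1 / 0.5))`, the two estimates can be compared with (6.6) and (6.7) evaluated at $B^*(1/\tau) = \tau/(\tau + \beta)$.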
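By collecting the preceding results, we obtain the following theorem:

Theorem 6.1

$$S_2(n) = \frac{n(\tau + \beta)}{(n-1)\,\bigl[2 - \tfrac{1}{2}B^*(1/\tau)\bigr]\bigl[\beta + \tfrac{1}{2}\tau B^*(1/\tau)\bigr] + \tfrac{1}{2}\tau + \beta}
 = \frac{\tau + \beta}{\bigl[2 - \tfrac{1}{2}B^*(1/\tau)\bigr]\bigl[\beta + \tfrac{1}{2}\tau B^*(1/\tau)\bigr]} + O\!\left(\frac{1}{n}\right).$$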


We give below $B^*(1/\tau)$ for some distribution functions B.

(i) B is exponential (with parameter $1/\beta$):

$$B^*(1/\tau) = \frac{\tau}{\tau + \beta}.$$

(ii) B is the Dirac function at the point $\beta$:

$$B^*(1/\tau) = e^{-\beta/\tau}.$$

(iii) B is uniform over [a,b]:

$$B^*(1/\tau) = \frac{e^{-a/\tau} - e^{-b/\tau}}{(b-a)/\tau}.$$
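The asymptotic part of Theorem 6.1 can be evaluated for these three cases with a few lines of code. In the sketch below all names (`s2`, `exponential`, `dirac`, `uniform`) are illustrative; the normalization $\tau + \beta = 1$, so that $\tau = \alpha$ and $\beta = 1 - \alpha$ with $0 < \alpha \le 1$, is an assumption made for convenience, and the uniform case uses $a = \beta/2$ and $b = 3\beta/2$.

```python
import math


def exponential(tau, beta):
    """B*(1/tau) when B is exponential with mean beta."""
    return tau / (tau + beta)


def dirac(tau, beta):
    """B*(1/tau) when B is concentrated at the point beta."""
    return math.exp(-beta / tau)


def uniform(tau, beta):
    """B*(1/tau) when B is uniform over [beta/2, 3*beta/2]."""
    if beta == 0:
        return 1.0  # degenerate case: no critical section
    a, b = beta / 2, 3 * beta / 2
    return (math.exp(-a / tau) - math.exp(-b / tau)) / ((b - a) / tau)


def s2(alpha, laplace):
    """Asymptotic speed-up for two processes at alpha = tau / (tau + beta)."""
    tau, beta = alpha, 1.0 - alpha  # normalization tau + beta = 1
    bstar = laplace(tau, beta)
    return (tau + beta) / ((2 - bstar / 2) * (beta + tau * bstar / 2))
```

Tabulating `s2(alpha, f)` for `f` in `(exponential, dirac, uniform)` over a grid of `alpha` values gives the data for a plot of the asymptotic speed-up against $\alpha$.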


In Figure 6.2, we have plotted the asymptotic speed-up ratio $S_2$ as a
function of the ratio $\alpha = \tau/(\tau + \beta)$ for the three distributions mentioned
above (in the third case, a and b have been chosen as $\beta/2$ and $3\beta/2$,
respectively).

When $\alpha$ tends to 0 (or $\beta$ tends to infinity), the algorithm approaches
its worst performance, since the evaluations of the two processes tend to
be exactly interleaved. When $\alpha = 1$ (or $\beta = 0$), the critical section is
non-existent and we have the results of Section 5.
We observe from Figure 6.2 that the best speed-up ratio is always
obtained when B is an exponential distribution (the first case). We also
note that the results obtained for the two other cases are very close to
each other and close to the results obtained with the exponential
distribution. This suggests that the results obtained with the exponential
distribution could be used as approximations to results obtained with other
distributions.

We can observe from Figure 6.2 that, unlike the implementation without
critical sections, better speed-up is not necessarily achieved by using more
processes, even though we assume that a processor is always available to each
process. More precisely, the figure indicates that (when B is an exponential
distribution) in order to achieve the best speed-up when two processors
are available, one should create two processes when $\alpha > 0.586$, but only one
process when $\alpha \le 0.586$. Similar results are useful in practice, since they
can be used to determine the optimal number of processes to create in order
to minimize the overall execution time.
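The threshold 0.586 can be recovered in closed form from the asymptotic expression of Theorem 6.1 when B is exponential. The following short derivation is a sketch under that assumption; it uses $B^*(1/\tau) = \tau/(\tau+\beta) = \alpha$ and divides numerator and denominator by $\tau + \beta$.

```latex
S_2 = \frac{1}{\bigl(2 - \tfrac{\alpha}{2}\bigr)\bigl(1 - \alpha + \tfrac{\alpha^{2}}{2}\bigr)},
\qquad
S_2 = 1
\iff \bigl(2 - \tfrac{\alpha}{2}\bigr)\bigl(1 - \alpha + \tfrac{\alpha^{2}}{2}\bigr) = 1
\iff \alpha^{3} - 6\alpha^{2} + 10\alpha - 4 = 0
\iff (\alpha - 2)(\alpha^{2} - 4\alpha + 2) = 0 .
```

The only root in the interval $(0, 1]$ is $\alpha = 2 - \sqrt{2} \approx 0.586$, which agrees with the value read from the figure.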
Figure 6.2. Speed-up ratio.


7. CONCLUSION AND OPEN PROBLEMS


In  recent  years,  research  in  parallel  algorithms  has  dealt  mostly 
with synchronized array or vector processors such as the ILLIAC IV or the
CDC STAR, and there are very few results on the design and analysis of
algorithms  for  asynchronous  multiprocessors.  In  this  paper,  we  have  pro- 
posed a novel  method  of  using  asynchronous  multiprocessors  which  takes  ad- 
vantage of  their  asynchronous  behavior.  We  have  also  presented  analytic 
techniques  to  evaluate  the  performance  of  an  asynchronous  algorithm  using 
the  method.  The  algorithm  is  expected  to  achieve  a large  speed-up  when 
the  fluctuations  in  the  task  execution  times  are  relatively  large.  More- 
over, as  noted  in  Section  4 , the  algorithm  has  a nice  reliability  property. 

The  idea  of  the  algorithm  may  also  be  used  to  construct  other  reliable 
algorithms . 

For  the  implementation  with  critical  sections  we  obtained  analytic 
results  for  two  processes.  The  results  show  that  the  parallel  algorithm 
using  two  processes  is  not  necessarily  faster  than  the  sequential  algo- 
rithm, because  of  the  critical  section  overheads  associated  with  the  paral- 
lel algorithm.  This  confirms  the  practical  experience  that  the  speed-up  ratio 
does  not  necessarily  increase  as  the  number  of  processes  increases.  It 
would  be  interesting  to  extend  our  analytic  results  for  more  than  two 
processes.  We  have  chosen  to  deal  with  a simple  problem  by  imposing  the 
condition  that  the  tasks  are  linearly  ordered.  An  interesting  extension 
would  be  to  consider  a set  of  tasks  (possibly  generated  dynamically) 
which  are  ordered  by  a directed  graph  (i.e.,  partially  rather  than  linearly 
ordered).  Another  interesting  extension  would  be  to  design  algorithms 
where  the  execution  of  a task  by  a process  may  be  interrupted  by  another 
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process.  We  expect  that  this  approach  would  result  in  more  efficient 
algorithms,  since  processes  which  are  not  doing  useful  work  can  be  inter- 
rupted. A careful  performance  analysis  including  the  additional  over- 
heads introduced  by  the  interruption  mechanism  is  needed  here. 

Finally,  we  note  that  the  results  of  this  paper  are  not  restricted 
to  multiprocessor  systems.  The  ideas  can  be  used  to  solve  any  problem  in 
operation  research  which  satisfies  conditions  similar  to  Cl,  C2  and  C3 . 




REFERENCES


(Note: References [1, 4, 5] are not cited in the text but contain
recent results on asynchronous multiprocessor algorithms.)

[1] Baudet, G. M., Asynchronous Iterative Methods for Multiprocessors,
    Computer Science Department Report, Carnegie-Mellon University,
    November 1976. To appear in J.ACM.

[2] Knuth, D. E., The Art of Computer Programming, Vol. 1: Fundamental
    Algorithms, Addison-Wesley, Reading, MA, second edition, 1973.

[3] Kung, H. T., "Synchronized and Asynchronous Parallel Algorithms for
    Multiprocessors," Algorithms and Complexity: New Directions and
    Recent Results, edited by J. F. Traub, Academic Press, 1976, 153-200.

[4] Kung, H. T. and S. W. Song, A Parallel Garbage Collection Algorithm
    and Its Correctness Proof, Computer Science Department Report,
    Carnegie-Mellon University, June 1977.

[5] Robinson, J. T., Analysis of Asynchronous Multiprocessor Algorithms
    with Application to Sorting, Computer Science Department Report,
    Carnegie-Mellon University, June 1977. To appear in Proc. 1977
    International Conference on Parallel Processing, August 1977.

[6] Wulf, W. A. and C. G. Bell, "C.mmp - A Multi-Mini-Processor," Proc.
    of the AFIPS 1972 Fall Joint Computer Conference, Vol. 41, 1972,
    765-777.


UNCLASSIFIED
SECURITY CLASSIFICATION OF THIS PAGE (When Data Entered)

REPORT DOCUMENTATION PAGE

4.  TITLE (and Subtitle): PARALLEL EXECUTION OF A SEQUENCE OF TASKS ON AN
    ASYNCHRONOUS MULTIPROCESSOR
5.  TYPE OF REPORT & PERIOD COVERED: Interim
7.  AUTHOR(s): G. M. Baudet, Carnegie-Mellon University;
    R. P. Brent, The Australian National University;
    H. T. Kung, Carnegie-Mellon University
8.  CONTRACT OR GRANT NUMBER(s): N00014-76-C-0370, NR 044-422
9.  PERFORMING ORGANIZATION NAME AND ADDRESS: Carnegie-Mellon University,
    Department of Computer Science, Pittsburgh, PA 15213
11. CONTROLLING OFFICE NAME AND ADDRESS: Office of Naval Research,
    Arlington, VA 22217
13. NUMBER OF PAGES: 31
15. SECURITY CLASS. (of this report): UNCLASSIFIED
16. DISTRIBUTION STATEMENT (of this Report): Approved for public release;
    distribution unlimited.

